14 research outputs found

    Analysis of affine motion-compensated prediction and its application in aerial video coding

    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom) contained in aerial video sequences such as captured from unmanned aerial vehicles, an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived by extending a model of purely translational motion-compensated prediction. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. It is of particular interest since such a model is considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. As the simplified affine motion model is able to describe most motions contained in aerial surveillance videos, its application in video coding is justified. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. 
Although the bit rate in motion-compensated prediction can be considerably reduced by using a motion model that can describe the motion types occurring in the scene, the total video bit rate may remain quite high, depending on the motion estimation accuracy. Thus, using aerial surveillance sequences as an example, a codec-independent region of interest (ROI) based aerial video coding system is proposed that exploits the characteristics of such sequences. Assuming the captured scene to be planar, one frame can be projected into another using global motion compensation. Consequently, only newly emerging areas have to be encoded. At the decoder, all new areas are registered into a so-called mosaic. From this, reconstructed frames are extracted and concatenated as a video sequence. To also preserve moving objects in the reconstructed video, local motion is detected and encoded in addition to the new areas. The proposed general ROI coding system was evaluated for very low and low bit rates between 100 and 5000 kbit/s for aerial sequences of HD resolution. It is able to reduce the bit rate by 90% compared to common HEVC coding of similar quality. Subjective tests confirm that the overall image quality of the ROI coding system exceeds that of a common HEVC encoder, especially at very low bit rates below 1 Mbit/s. To prevent discontinuities introduced by inaccurate global motion estimation, as may be caused by radial lens distortion, a fully automatic in-loop radial distortion compensation is proposed. For this purpose, an unknown radial distortion compensation parameter that is constant for a group of frames is jointly estimated with the global motion. This parameter is optimized to minimize the distortions of the projections of frames in the mosaic. With this approach, the global motion compensation was improved by 0.27 dB and discontinuities in the frames extracted from the mosaic were diminished. 
As an additional benefit, the generation of long-term mosaics becomes possible, constructed from more than 1500 aerial frames with unknown radial lens distortion and without any calibration or manual lens distortion compensation.
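
The new-area idea in the abstract above (under a planar-scene assumption, only regions not covered by the previous frame need to be encoded) can be illustrated with a minimal sketch. It assumes a 3x3 homography that maps current-frame pixel coordinates into the previous frame. The names `project` and `new_area_mask` are hypothetical and not taken from the thesis, which may detect new areas differently.

```python
def project(h, x, y):
    """Apply a 3x3 homography h (nested lists, row-major) to the point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return px, py


def new_area_mask(h_cur_to_prev, width, height):
    """Mark current-frame pixels that have no counterpart in the previous frame.

    Assumption: both frames have the same size, and a pixel counts as
    "new area" when its projection falls outside the previous frame.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            px, py = project(h_cur_to_prev, x, y)
            inside = 0 <= px <= width - 1 and 0 <= py <= height - 1
            row.append(not inside)
        mask.append(row)
    return mask


# Example: the camera moved so that current pixel x corresponds to x + 2
# in the previous frame; the two rightmost columns are then new.
shift_right = [[1, 0, 2], [0, 1, 0], [0, 0, 1]]
mask = new_area_mask(shift_right, width=8, height=4)
```

In this toy case only the marked strip would be passed to the encoder as ROI; moving objects would need a separate detector.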

    Analysis of Affine Motion-Compensated Prediction in Video Coding

    Motion-compensated prediction is used in video coding standards like High Efficiency Video Coding (HEVC) as one key element of data compression. Commonly, a purely translational motion model is employed. In order to also cover non-translational motion types like rotation or scaling (zoom), e.g. contained in aerial video sequences such as captured from unmanned aerial vehicles (UAV), an affine motion model can be applied. In this work, a model for affine motion-compensated prediction in video coding is derived. Using the rate-distortion theory and the displacement estimation error caused by inaccurate affine motion parameter estimation, the minimum required bit rate for encoding the prediction error is determined. In this model, the affine transformation parameters are assumed to be affected by statistically independent estimation errors, which all follow a zero-mean Gaussian distributed probability density function (pdf). The joint pdf of the estimation errors is derived and transformed into the pdf of the location-dependent displacement estimation error in the image. The latter is related to the minimum required bit rate for encoding the prediction error. Similar to the derivations of the fully affine motion model, a four-parameter simplified affine model is investigated. Both models are of particular interest since they are considered for the upcoming video coding standard Versatile Video Coding (VVC) succeeding HEVC. Both models provide valuable information about the minimum bit rate for encoding the prediction error as a function of affine estimation accuracies. © 1992-2012 IEEE
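
To see why the displacement estimation error is location-dependent, consider only the horizontal component of an affine model and assume, for illustration, that the error is e_x = e11*x + e12*y + e_t with independent zero-mean Gaussian parameter errors. Its variance then grows with the distance from the origin. The sketch below compares this closed form with a Monte Carlo estimate; the function names, the parametrization and the numeric values are assumptions, not taken from the paper.

```python
import random


def displacement_error_variance(x, y, sigma_a, sigma_t):
    """Variance of e_x = e11*x + e12*y + e_t at image position (x, y).

    Assumes independent zero-mean Gaussian errors: standard deviation
    sigma_a for e11 and e12, and sigma_t for the translation error e_t.
    """
    return (x * x + y * y) * sigma_a ** 2 + sigma_t ** 2


def simulate_variance(x, y, sigma_a, sigma_t, n=200_000, seed=0):
    """Monte Carlo estimate of the same variance (the mean is zero)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        e = (rng.gauss(0.0, sigma_a) * x
             + rng.gauss(0.0, sigma_a) * y
             + rng.gauss(0.0, sigma_t))
        total += e * e
    return total / n


analytic = displacement_error_variance(100, 50, sigma_a=0.001, sigma_t=0.1)
simulated = simulate_variance(100, 50, sigma_a=0.001, sigma_t=0.1)
```

At the origin only the translation error contributes, whereas far from the origin the errors of the linear parameters dominate.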

    Analysis of coding tools and improvement of text readability for screen content

    Current video coding standards perform well for video sequences captured by a real camera. The aperture of the camera’s optical system smooths the content and attenuates higher frequencies. New application scenarios, enabled by the growing number of high bit rate internet gateways, however, make it necessary to take a closer look at the efficiency of such standards in handling artificial content. Remote desktop applications, for example, often include text parts. As a consequence, these content types contain sharp edges or high frequencies, which are considered less important in natural video and are therefore treated less carefully. The frequent result is an increased occurrence of artefacts or the loss of information that is actually important to the user. This paper gives an analysis of such artificially created video sequences, evaluates the performance of current coding tools for this type of content and proposes a simple, yet effective way to maintain readability of text within video material using only well-considered encoder control and without the need for large additional modules.

    Mesh-based global motion compensation for robust mosaicking and detection of moving objects in aerial surveillance

    Global motion compensation is one of the key technologies for aerial image processing, e.g. to detect moving objects on the ground or to generate a mosaic image of the observed area. For this task, it is necessary to estimate and compensate the motion of the pixels between the recorded frames that is caused by the movement of the camera. As the camera is rigidly attached to a flying device such as a quadrocopter (also called Micro Air Vehicle, MAV) or a helicopter, the motion of the camera directly corresponds to the movement of the aircraft. For simplification, usually only a planar landscape model is used to describe the global motion of the scene. However, if objects like buildings or mountains are close to the camera, i.e. the MAV is at a low altitude, this simplification is not valid. Therefore, we propose a more complex model by introducing a 2D mesh-based motion compensation technique, also known as image warping, to compensate the global motion. When used for mosaic creation, it yields smaller artifacts due to perspective distortions and smaller drift problems. We also improve a moving object detection system to identify moving objects more reliably. Moreover, the proposed method is also more robust in case of lens distortions.
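
One common way to realize mesh-based warping is to give every mesh patch its own affine map, determined by the displacement of its vertices. The following sketch assumes triangular patches, which the abstract does not specify. `affine_from_triangle` and `apply_affine` are hypothetical helper names; the six parameters are obtained from the three vertex correspondences via Cramer's rule.

```python
def affine_from_triangle(src, dst):
    """Return ((a, b, c), (d, e, f)) with u = a*x + b*y + c, v = d*x + e*y + f.

    src and dst are lists of three (x, y) vertex positions of one mesh
    triangle in the reference and in the current frame, respectively.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def solve(t0, t1, t2):
        # Cramer's rule for the system [x_i, y_i, 1] * (a, b, c)^T = t_i.
        a = (t0 * (y1 - y2) + t1 * (y2 - y0) + t2 * (y0 - y1)) / det
        b = (t0 * (x2 - x1) + t1 * (x0 - x2) + t2 * (x1 - x0)) / det
        c = (t0 * (x1 * y2 - x2 * y1) + t1 * (x2 * y0 - x0 * y2)
             + t2 * (x0 * y1 - x1 * y0)) / det
        return a, b, c

    (u0, v0), (u1, v1), (u2, v2) = dst
    return solve(u0, u1, u2), solve(v0, v1, v2)


def apply_affine(params, x, y):
    """Map a point inside the source triangle to its warped position."""
    (a, b, c), (d, e, f) = params
    return a * x + b * y + c, d * x + e * y + f


params = affine_from_triangle([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 6)])
warped = apply_affine(params, 1, 1)
```

Because each triangle gets its own parameters, the warp can follow scene structure that a single planar model cannot describe.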

    Low bit rate roi based video coding for hdtv aerial surveillance video sequences

    For aerial surveillance systems, two key features are important. First, they have to provide as much resolution as possible; second, they should make the video available at a ground station as soon as possible. Recently, so-called Unmanned Aerial Vehicles (UAVs) have come into focus for surveillance operations with operation targets such as environmental and disaster area monitoring as well as military surveillance. Common transmission channels for UAVs are only available with small bandwidths of a few Mbit/s. In this paper we propose a video codec which is able to provide full HDTV (1920 × 1080 pel) resolution with a bit rate of about 1–3 Mbit/s including moving objects (instead of 8–15 Mbit/s when using the standardized AVC codec). The coding system is based on an AVC video codec which is controlled by ROI detectors. Furthermore, we make use of additional Global Motion Compensation (GMC). In a modular concept, different Region of Interest (ROI) detectors can be added to adjust the coding system to special operation targets. This paper presents a coding system with two motion-based ROI detectors: one for new area detection (ROI-NA) and another for moving objects (ROI-MO). Our system preserves more details than an AVC coder at the same bit rate of 1.0 Mbit/s for the entire frame.

    Rate-Distortion Theory for Affine (Global) Motion Compensation in Video Coding

    Presentation at the SVCP 201

    Mesh-based piecewise planar motion compensation and optical flow clustering for ROI coding

    For the transmission of aerial surveillance videos taken from unmanned aerial vehicles (UAVs), region of interest (ROI)-based coding systems are of growing interest in order to cope with the limited channel capacities available. We present a fully automatic detection and coding system which is capable of transmitting high-resolution aerial surveillance videos at very low bit rates. Our coding system is based on the transmission of ROI areas only. We assume two different kinds of ROIs: in order to limit the transmission bit rate while simultaneously retaining a high-quality view of the ground, we only transmit new emerging areas (ROI-NA) for each frame instead of the entire frame. At the decoder side, the surface of the earth is reconstructed from transmitted ROI-NA by means of global motion compensation (GMC). In order to retain the movement of moving objects not conforming with the motion of the ground (like moving cars and their previously occluded ground), we additionally consider regions containing such objects as interesting (ROI-MO). Finally, both ROIs are used as input to an externally controlled video encoder. While we use GMC for the reconstruction of the ground from ROI-NA, we use mesh-based motion compensation in order to generate the pelwise difference in the luminance channel (difference image) between the mesh-based motion compensated and the current input image to detect the ROI-MO. High spots of energy within this difference image are used as seeds to select corresponding superpixels from an independent (temporally consistent) superpixel segmentation of the input image in order to obtain accurate shape information of ROI-MO. For a false positive detection rate (regions falsely classified as containing local motion) of less than 2%, we detect more than 97% true positives (correctly detected ROI-MOs) in challenging scenarios. Furthermore, we propose to use a modified high-efficiency video coding (HEVC) video encoder. 
Retaining full HDTV video resolution at 30 fps and subjectively high quality, we achieve bit rates of about 0.6-0.9 Mbit/s, which is a bit rate saving of about 90% compared to an unmodified HEVC encoder.
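
The seed-based selection step described above can be sketched as follows. High-energy pixels of the difference image act as seeds, and every superpixel containing at least one seed is marked as ROI-MO. The fixed threshold, the nested-list image layout and the name `select_superpixels` are assumptions made for illustration; the paper's actual seed criterion is not given in the abstract.

```python
def select_superpixels(diff, labels, threshold):
    """Return a boolean ROI mask covering all superpixels that contain a seed.

    diff      -- 2D list with the absolute luminance difference per pixel
    labels    -- 2D list of the same shape with a superpixel id per pixel
    threshold -- assumed seed criterion: diff value >= threshold
    """
    seeds = {
        labels[y][x]
        for y, row in enumerate(diff)
        for x, value in enumerate(row)
        if value >= threshold
    }
    return [[label in seeds for label in row] for row in labels]


diff = [[0, 0, 0, 0],
        [0, 9, 0, 0]]
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
roi_mo = select_superpixels(diff, labels, threshold=5)
# Superpixel 0 contains the single seed, so its whole area is selected.
```

Taking whole superpixels rather than thresholded pixels is what provides the object shape, since the difference image typically responds only along parts of a moving object.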